The use of biomarkers in epidemiology is not new, but
recent developments in molecular biology and
genetics have increased the opportunities for their
use. However, epidemiological studies based on
biomarkers, which belong to the discipline defined
as 'molecular epidemiology', are subject to the same
problems of design and analysis as 'traditional'
epidemiological studies. If biomarkers offer new
opportunities to overcome some of the limitations
of epidemiology, their added value over traditional
approaches should be systematically assessed.
Biomarkers should be validated through transitional
studies; consideration of sources of bias and
confounding in molecular epidemiology studies
should be no less stringent than in traditional studies.

Keywords: bias, biomarkers, confounding, molecular
epidemiology, statistical power.


Introduction 

The definition of molecular epidemiology applies to a 
discipline overlapping with both public health and 
experimental science. There is no universally 
accepted definition of which type of research fits 
into the discipline, and emphasis is given to different 
aspects that depend to a large extent on the 
background of the investigators involved. So, for
an epidemiologist, molecular epidemiology might
include any epidemiological study involving the use
of any biologically based measurement (including
possibly measurement of blood pressure), whilst for
a molecular biologist the search for a new gene in a
series of a few dozen patients would qualify as a
molecular epidemiological study. This confusion
may originate from the growing availability of
molecular (and, more generally, biological)
approaches to measure variables that might be
relevant in epidemiology, and from the recognition
of the need to validate and apply them to carefully
defined populations.

Despite the ambiguities in the definition of
molecular epidemiology [1], the term has been
successful and is now widely used in names of
departments, university courses and programmes,
scientific meetings and professional societies. Several
textbooks have also been published [2, 3].

The use of biologically based approaches to
measure variables of interest in epidemiological
studies is not new. Two areas in which studies have
been conducted for decades that would nowadays be
included under the rubric of molecular epidemiology
are epidemiology of infectious diseases and
cardiovascular epidemiology. As an example, Fig. 1 shows
the results of one of the early studies linking elevated
serum cholesterol levels (which today might be
classified as a biomarker of exposure) to risk of
ischaemic heart disease in the Framingham cohort
[4]. The recent expansion of studies based on

and the precision of the measurement of the
biologically relevant exposure variables. In some
cases, the biomarker-based measure of exposure
represents an obvious improvement towards a better
assessment of exposure.

Aflatoxin represents a good example in which
exposure biomarkers have represented a step
forward in the identification of human cancer hazards.
The fungus Aspergillus flavus is a common
contaminant of foodstuffs, in particular cereals and nuts,
especially in West Africa and East Asia. In certain
storage conditions, it produces a toxin, called
aflatoxin, which shows strong hepatotoxic and
carcinogenic properties in several animal models. Given
the lack of evident colour, taste or smell of aflatoxin
in processed food, there is no sensible way
individuals can know whether the food they consume is
contaminated. Studies on the carcinogenic effect of
aflatoxin have therefore been limited by the difficulty
in determining exposure status at the individual
level, although ecological study areas (e.g. villages)
in which contamination was frequent had a higher
occurrence of liver cancer than neighbouring areas
with less frequent contamination. This situation
changed with the identification of serum and urine
biomarkers of aflatoxin exposure, namely urinary
metabolites of aflatoxin itself and of its adducts
formed with DNA. Table 2 reports the results of the
first investigation that assessed the risk of liver
cancer in subjects with samples collected and stored
before the disease occurred. Individuals with any
urinary marker of exposure had a 2.4-fold increased
risk of liver cancer relative to individuals without
markers; the relative risk was as high as 4.9
amongst individuals positive for the urinary adduct
degradation product AFB1-N7-guanine [8]. It is
noteworthy that urinary samples in this study were
taken on average only two years before analysis, at
a time that might not be biologically relevant for
liver cancer development. It is therefore conceivable


Table 2 Relative risk of liver cancer and exposure to aflatoxin [8]

that aflatoxin exposure might have been
misclassified as compared with the biologically relevant
exposure. Yet, the study provided strong evidence
of a causal association between aflatoxin and liver
cancer in exposed humans.

Bias 

Three main types of bias are recognized in
epidemiology, and all three may operate in biomarker-based
studies [9]. Selection bias arises from lack of
comparability of groups included in the study (e.g.
cases and controls); for example, exposed cases
might be more (or less) likely to participate in a
study than exposed controls. Information bias
involves misclassification of participants with
respect to disease or exposure status. In biomarker-based
studies, information bias encompasses the
issues of validity, reproducibility and stability of
markers. Finally, confounding is a special form of
bias, due to exposure to risk factors other than those
under study (see below).

Selection bias can be avoided by properly
identifying the study population, and by optimizing the
response rate. Furthermore, it can be controlled in
the analysis by identifying factors that are related to
selection and by controlling them as confounders.
Unfortunately, many molecular epidemiological
studies pay relatively little attention to the selection
of participants. This is particularly the case for
studies of genetic factors, such as metabolic
polymorphisms, because it is considered that any
selection of participants is unlikely to be related to
the genetic factors under study. As an example, an
association with lung cancer risk has been reported
in early studies of polymorphism of the CYP2D6
gene. Later studies, however, have not confirmed
the finding, and the early results are likely to have
arisen from the use of improper control groups [10].
In general, prospective studies, for example of the
cohort design, are less prone to selection bias than
retrospective studies, such as those based on a
case-control comparison.

Sources of variation in biomarker-based
measurements might arise from intergroup (e.g. cases versus
controls) variability; this is the phenomenon
molecular epidemiological studies usually aim to address.
However, other sources of variation exist that
generate misclassification. Interindividual variability
might be due to genetic or environmental factors

biomarkers has represented a change in scientific
paradigm in other areas of chronic disease
epidemiology, notably in cancer epidemiology.

It is useful to consider molecular epidemiology
within the framework of epidemiological studies in
general. Epidemiology aims to identify determinants
of disease (either risk factors or protective factors)
and to quantify their role. This is done whilst taking
into account (to the extent that is feasible) sources
of random and systematic error (bias and
confounding), as well as factors that modify the effect of the
determinant of interest (Fig. 2). As an example,
epidemiological studies have shown an increased
risk of cancer of the oral cavity amongst alcohol
drinkers [5]. This relation has been statistically
significant (i.e. random error has been excluded) in
several large studies, and it has been confirmed
no matter how alcohol drinking and oral cancer
were measured (i.e. information bias originating
from misclassification of exposure and outcome has
been excluded) and in different types of
epidemiological studies (i.e. case-control and cohort studies,
thus reducing the likelihood of bias from selection of
study subjects). Furthermore, the potential
confounding effect of tobacco smoking has been
controlled in various studies (e.g. by restricting the
analysis to nonsmokers), and the susceptibility to
alcohol-related oral cancer in different groups (e.g.
possibly related to ethnicity) has also been studied.

Fig. 2 Identification of exposure-disease relations with
epidemiology.

To a large extent, molecular epidemiological
studies fit in the same framework: in other words,
they consist of epidemiological studies in which
either risk factors, outcomes, confounders or effect
modifiers are measured with biomarkers. Similarly,
the same arguments should be applied to judging
the design, analysis and interpretation of these
studies as in the case of 'traditional' epidemiological
studies. One exception is the class of 'transitional'
studies, which represent a type of investigation
specific to molecular epidemiology (see below).

Biomarkers have been traditionally distinguished
into markers of exposure, disease and susceptibility
(Table 1). This distinction is, however, somewhat
arbitrary. For example, chromosomal aberrations
have been used for decades to monitor exposure to
environmental carcinogens [6]. From this point of
view, they can be classified as exposure biomarkers.
However, recent evidence points towards a role of
chromosomal aberrations in predicting cancer risk
[7], in that subjects with an increased frequency of
cells with aberrations are at increased risk of cancer,
independently of the agent they were exposed to.
In this respect, they can be seen as early markers of
disease.

Measuring with biomarkers

The rationale for using biomarkers to measure
exposure lies in the attempt to increase the validity

Table 3 Sources of variation for selected biomarkers used in cancer epidemiology (adapted from Vineis, 1997 [11])

interacting with the variable under study.
Intraindividual variability refers to components of variation
such as daily variation in hormonal levels. Finally,
measurement error might arise from sampling and
laboratory variation. Table 3 provides some
examples of sources of variation for selected biomarkers
used in molecular cancer epidemiology.

Proper precautions should be taken to minimize
the sources of variation other than intergroup
variability. The potential sources of such bias are
numerous: the circumstances in which biological
samples are taken, processed, scored and analysed;
the technical aspects of the assays, etc. It is
important to ensure that, if all sources of variation
cannot be controlled (as is often the case), they
apply equally to the groups being compared.
Therefore, if long-term storage of samples might
affect the measurement, it is important to match
cases and controls in the study by duration of
sample storage. In such a case, misclassification is
said to be 'nondifferential' (i.e. acting equally on the
groups being compared). Nondifferential
misclassification invariably produces bias towards the null
value, that is, it obscures an existing causal (or
protective) association, but it does not generate
false-positive results. On the other hand, a
misclassification that is 'differential' with respect to
case-control (or exposed-unexposed) status
generates a bias in an unpredictable direction. For
example, if there is substantial interbatch (or
interreader) variability in the measurement, the
inclusion of samples of cases and controls in different
batches would generate differential misclassification,
whilst a proper mix of samples in each batch would
at most result in nondifferential misclassification
(e.g. because results of samples from one batch tend
to be systematically different from those of samples
from another batch).
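The contrast between nondifferential and differential misclassification can be illustrated with a small calculation of expected odds ratios. The sketch below is only an illustration: the exposure prevalences and the marker sensitivities and specificities are hypothetical values chosen for the example, not figures from any study cited here.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion classified as exposed by an imperfect marker."""
    return true_prev * sensitivity + (1.0 - true_prev) * (1.0 - specificity)


def odds_ratio(p_cases, p_controls):
    """Exposure odds ratio from exposure proportions in cases and controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


def observed_or(p_cases, p_controls, marker_cases, marker_controls):
    """Expected odds ratio when each group is measured with its own
    (sensitivity, specificity) pair."""
    return odds_ratio(
        observed_prevalence(p_cases, *marker_cases),
        observed_prevalence(p_controls, *marker_controls),
    )


# Hypothetical true exposure prevalences in cases and controls.
TRUE_CASES, TRUE_CONTROLS = 0.40, 0.20
true_or = odds_ratio(TRUE_CASES, TRUE_CONTROLS)

# Same marker performance in both groups: nondifferential error.
nondifferential_or = observed_or(
    TRUE_CASES, TRUE_CONTROLS, (0.80, 0.90), (0.80, 0.90)
)

# Marker performance differs by group (e.g. separate batches): differential error.
differential_up = observed_or(
    TRUE_CASES, TRUE_CONTROLS, (0.95, 0.85), (0.80, 0.90)
)
differential_down = observed_or(
    TRUE_CASES, TRUE_CONTROLS, (0.80, 0.90), (0.80, 0.80)
)

print(f"true OR              {true_or:.2f}")
print(f"nondifferential OR   {nondifferential_or:.2f}")
print(f"differential OR (a)  {differential_up:.2f}")
print(f"differential OR (b)  {differential_down:.2f}")
```

With these assumed values the nondifferential estimate lies between the null value and the true odds ratio, whereas the two differential scenarios fall on opposite sides of the true value.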

Transitional studies 

The issue of variation in biomarker-based
measurements impinges on the need to validate biomarkers
before their application in large-scale studies. This is
the domain of so-called transitional studies, which
aim to characterize the biomarker itself rather than
the underlying biological phenomenon. The aspects
assessed by transitional studies include intra- and
intersubject variability; feasibility of application of a
biomarker in field conditions (and optimization of its
use); confounders and effect modifiers for the
marker; and underlying biological mechanisms
reflected by the marker.

Transitional studies usually involve healthy
individuals, patients or subjects with specific exposures
(e.g. groups of workers). Three types of transitional
studies have been described in the continuum
between development of a new assay and its
application in human populations (Table 4). Table 5
presents the results of an interlaboratory
comparison of measurements of DNA adducts using the
32P-postlabelling technique in blood samples from
workers exposed to polycyclic aromatic hydrocarbons
(PAHs) in foundries and unexposed subjects. The
results suggest an important interlaboratory
variability in this assay: lack of control for laboratory (i.e.
a comparison of results amongst those exposed from
laboratory 1 and amongst those not exposed from
laboratory 2, or vice versa) would result in grossly
biased results. Unfortunately, biomarkers are often
applied in the field without a proper characterization
and validation, and therefore hamper the
interpretation of the results.

Table 4 Types of transitional studies (adapted from Schulte & Perera, 1997 [12])

Type of study   Aims                      Characteristics
Developmental   Development of            Builds on experimental studies
                biomarkers                Test assay in human samples
                                          Evaluate biological sample collection,
                                          processing, storage
...             ...                       ... biomarker kinetics, and potential
                                          confounders and effect modifiers
Applied         Use in cross-sectional,   Evaluation of exposure status of various
                [illegible], panel        populations and further validation of
                studies                   the biomarker

Confounding 

Confounding refers to a condition in which an
observed association between a suspected risk factor
and a disease is due to a different risk factor, which is
a true cause of the disease. In a classical example, an
association between tobacco smoking and cancer of
the uterine cervix has been observed in many
different populations. However, this association is
likely to be confounded by infection with the human
papilloma virus (HPV), which is a cause of cervical
cancer and is associated with tobacco smoking (in
the sense that in many populations smokers are
more frequently positive for HPV than nonsmokers).
The use of biomarkers does not prevent confounding
from occurring, and it is important to consider
confounding as an alternative explanation when
associations are observed. In the example above,
infection with HPV would be a confounder of the
association between tobacco smoking and cervical
cancer no matter how smoking, infection and
cervical cancer are assessed (via questionnaires,


Table 5 Validation study of DNA adducts in foundry workers and
controls (32P-postlabelling): comparison of two laboratories [13]

                     Laboratory 1   Laboratory 2
Exposed (n = 35)     26 ± 43        9.2 ± 23
Unexposed (n = 6)    3.1 ± 1.7      1.7 ± 0.7


medical records, biochemical methods or molecular
techniques). Furthermore, use of biomarkers might
introduce confounders. Figure 3 presents the
example of a study of occupational exposure to PAHs and
lung cancer. Tobacco smoking might represent an
alternative source of PAHs, of greater importance
than occupational exposure. In such a case, the
results of the biomarker-based test will be driven by
tobacco smoking rather than occupational exposure,
even in the absence of an association between
smoking and occupational exposure.
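How a confounder can produce a crude association that disappears within strata can be sketched numerically. The counts below are invented for illustration (they are not data on smoking, HPV and cervical cancer), and the Mantel-Haenszel summary odds ratio is used here simply as one standard way of adjusting for a stratification variable; it is not a method described in the text.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def crude_table(strata):
    """Collapse the strata into a single 2x2 table by summing each cell."""
    return tuple(sum(cells) for cells in zip(*strata))


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts (a, b, c, d), stratified by the confounder.
STRATA = [
    (60, 40, 30, 20),   # confounder present
    (10, 40, 40, 160),  # confounder absent
]

crude_or = odds_ratio(*crude_table(STRATA))
stratum_ors = [odds_ratio(*table) for table in STRATA]
adjusted_or = mantel_haenszel_or(STRATA)

print(f"crude OR     {crude_or:.2f}")
print("stratum ORs ", [f"{value:.2f}" for value in stratum_ors])
print(f"adjusted OR  {adjusted_or:.2f}")
```

In this constructed example the exposure is more common where the confounder is present, and the confounder is more common in cases, so the crude odds ratio is raised although no association exists within either stratum.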


Interaction 

Biomarkers have been widely applied to study
gene-environment and gene-gene interactions in the
pathogenesis of cancer and other chronic diseases.
In general, an interaction between a genetic and an
environmental factor can be studied using a fourfold
table as shown in Fig. 4. Individuals with the
low-risk genetic trait and without the environmental
exposure form the reference group, and the relative
(or excess) risk is estimated in individuals with the
high-risk gene, with the environmental exposure



Adducts are expressed per million nucleotides (note to Table 5).

Fig. 3 Example of confounding.

Fig. 4 Interaction between an environmental factor (e) and a
genetic factor (g). Ref, reference category; RR, relative risk.


and with both factors. Table 6 provides an example
of a study of lung cancer that addressed both
tobacco smoking and genetic polymorphism for the
gene encoding the enzyme glutathione-S-transferase
(GST) M1, which might be implicated in the
metabolism of tobacco carcinogens. The relative risk
of lung cancer is higher in heavy smokers with the
null genotype than in individuals with only one risk
factor, suggesting an independent role of both
tobacco smoking and GSTM1 polymorphism. In
particular, the combined relative risk (10.2) is
intermediate between what would be expected
according to an additive model of interaction
(assuming that tobacco smoking and the
polymorphism act on different carcinogenic pathways,
or $RR_{eg} = RR_e + RR_g - 1 = 2.5 + 7.8 - 1 = 9.3$) and
a multiplicative model (assuming that they act on
the same pathway, or
$RR_{eg} = RR_e \times RR_g = 2.5 \times 7.8 = 19.5$). It should be
noted, however, that the wide 95% confidence
interval of the relative risk in the group with both
factors (4.4-23.3) is compatible with both interaction
models, which stresses another methodological
aspect of molecular epidemiological studies, namely
the need for a large sample size (see below).

Table 6 Interaction between tobacco smoking and glutathione-S-
transferase (GST) M1 polymorphism in lung cancer (adapted from
Nakachi et al. [14])
... the second row the relative risk and the third row the 95%
confidence interval.
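The expected joint relative risks under the additive and multiplicative models quoted in the text can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are chosen for this example only, and the inputs are the two single-factor relative risks, the observed combined relative risk and its confidence interval as given above.

```python
import math


def expected_additive(rr_first, rr_second):
    """Joint relative risk expected if the excess risks simply add up."""
    return rr_first + rr_second - 1.0


def expected_multiplicative(rr_first, rr_second):
    """Joint relative risk expected if the relative risks multiply."""
    return rr_first * rr_second


def within(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


# Figures quoted in the text for the lung cancer example.
RR_SINGLE = (2.5, 7.8)          # relative risks with only one factor
RR_OBSERVED = 10.2              # relative risk with both factors
CI_OBSERVED = (4.4, 23.3)       # its 95% confidence interval

additive = expected_additive(*RR_SINGLE)
multiplicative = expected_multiplicative(*RR_SINGLE)

print(f"additive expectation        {additive:.1f}")
print(f"multiplicative expectation  {multiplicative:.1f}")
print("additive within CI:        ", within(additive, CI_OBSERVED))
print("multiplicative within CI:  ", within(multiplicative, CI_OBSERVED))
```

Both expectations fall inside the reported interval, which is the sense in which the data cannot discriminate between the two models.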

Molecular epidemiology studies addressing other
types of interaction between two or more factors can
be discussed according to the same paradigm used
for gene-environment interactions. For example, in
the study of aflatoxin exposure and liver cancer
mentioned above, the investigators addressed the
possible interaction of aflatoxin with hepatitis B
virus (HBV). When compared to HBV-negative
subjects without aflatoxin exposure, the relative risk
in HBV-positive subjects who were also positive for
aflatoxin markers was 60, which was greater than
the product of the relative risks for the two factors
separately (4.8 for HBV and 1.9 for aflatoxin),
suggesting a synergism between aflatoxin and HBV
in liver carcinogenesis. However, the wide
confidence interval in the group with both exposures
(6.4-560, based on only seven cases and two
controls) does not allow the rejection of the
hypothesis of no interaction according to a
multiplicative model (4.8 × 1.9 = 9.1).


Random error 

From several of the examples quoted above it is clear
that a major problem in biomarker-based
epidemiological research is the insufficient number of subjects
included in each study. The main reasons for a small
study lie with logistical and financial constraints.
Indeed, any biomarker-based measure introduced in
epidemiology should be compared with traditional
approaches (e.g. the assessment of a given exposure
using a biochemical or molecular method versus a
questionnaire), and the possible gain in sensitivity
and specificity of the measurement should be
considered in the light of the possible decrease in
the number of study subjects.

Many authors have proposed formulae to
calculate the sample size needed to detect main effects and
interactions amongst risk factors [9, 15]. However,
most published studies do not include enough
individuals, resulting in unstable and often
conflicting results. Recently, large-scale studies have started
to be conducted (see, for example, a recent study of
colon cancer and polymorphism for GSTM1 and
NAT2 genes, based on over 4000 cases and controls
[16]). Another approach is the pooling of
independently conducted studies, as has been performed for
studies of metabolic polymorphism and cancer [17].
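As an indication of the orders of magnitude involved, the number of cases needed to detect a main effect in an unmatched case-control study can be approximated with the usual normal-approximation formula for comparing two proportions. The sketch below is an assumption-laden illustration: it is a common textbook formula rather than necessarily one of those proposed in the cited references, it assumes equal numbers of cases and controls and a two-sided test, and the prevalence and odds ratios are hypothetical.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p_controls, odds_ratio):
    """Exposure prevalence in cases implied by the prevalence in controls
    and the odds ratio to be detected."""
    return odds_ratio * p_controls / (1.0 + p_controls * (odds_ratio - 1.0))


def cases_needed(p_controls, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with as many controls) required.

    Assumes odds_ratio != 1, an unmatched design and a two-sided test.
    """
    p0 = p_controls
    p1 = exposure_in_cases(p_controls, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p1 * (1.0 - p1) + p0 * (1.0 - p0))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Hypothetical scenario: 20% of controls carry the marker of interest.
for target_or in (1.5, 2.0, 3.0):
    print(f"OR = {target_or}: about {cases_needed(0.20, target_or)} cases")
```

The required number rises steeply as the odds ratio to be detected approaches 1; the detection of interactions generally requires larger numbers still.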

Publication bias 

One characteristic of molecular epidemiology studies
is the relatively large number of variables on
exposures, disease and effect modifiers. Under these
circumstances, the probability of generating by
chance statistically significant results increases. It
has been shown that there is a tendency to
selectively report significant results, in particular
when they show an effect in the expected direction.
The net result is a biased reporting of positive over
negative or null results. As an example, several
studies have been conducted on polymorphism of
the CYP2D6 gene, which encodes an enzyme
possibly involved in the activation of lung
carcinogens, and lung cancer risk. Figure 5 shows the
results of the 16 studies available for a recent
meta-analysis [18]. Results are reported in terms of
logarithm of the relative risk for high-risk CYP2D6
polymorphism and of its standard error. Each study
is identified by one dot; studies towards the right are
smaller than studies towards the left, and studies
towards the top are more positive than studies
towards the bottom. If no publication bias exists, the
pattern of such results should resemble a triangle (or
a funnel), with larger studies converging on the left
side around the central ('true') value, and smaller
studies symmetrically dispersed on the right side.
However, the empty area in the bottom right corner
of the graph suggests that smaller studies were more
likely to be reported if they showed a positive effect.



Fig. 5 Funnel plot of studies on CYP2D6 polymorphism and lung
cancer.


A formal test confirms the presence of an
asymmetrical distribution of results.
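The text does not state which formal test was used. One widely used option, assumed here purely for illustration, is a regression of the standardized effect (log relative risk divided by its standard error) on precision (the reciprocal of the standard error), in the manner of Egger's test: an intercept far from zero points to funnel-plot asymmetry. The sketch below computes only the intercept, on invented data; a complete test would also require its standard error.

```python
def egger_intercept(log_rr, std_err):
    """Intercept of the least-squares regression of effect/SE on 1/SE."""
    precision = [1.0 / se for se in std_err]
    standardized = [effect / se for effect, se in zip(log_rr, std_err)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Invented standard errors: large studies (small SE) to small studies.
STD_ERR = [0.1, 0.2, 0.3, 0.4, 0.5]

# No small-study effect: every study estimates the same log relative risk.
NO_BIAS = [0.2 for _ in STD_ERR]

# Small-study effect: the smaller the study, the larger the reported effect.
WITH_BIAS = [0.2 + 1.0 * se for se in STD_ERR]

print(f"intercept without small-study effect  {egger_intercept(NO_BIAS, STD_ERR):.3f}")
print(f"intercept with small-study effect     {egger_intercept(WITH_BIAS, STD_ERR):.3f}")
```

In the second data set the reported effect grows with the standard error by construction, and the intercept recovers that dependence.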

It can be argued that such an initial report of
false-positive results should not be considered a
major scientific problem, since subsequent studies,
aimed to replicate the early positive results, will
eventually establish the truth. However, this
approach is inefficient and represents an important
waste of resources, in particular in the case of studies
based on expensive approaches, such as molecular
epidemiological studies. For example, a study of
metabolic susceptibility reported that
postmenopausal smoking women with slow acetylation genotype
for the N-acetyl-transferase 2 gene had an increased
risk of breast cancer, whilst this effect was not seen
in nonsmoking women or in women with rapid
acetylation genotype [19]. Given the absence of an
overall increased risk of breast cancer from tobacco
smoking, the results were not very plausible.
However, many subsequent studies were published
on this topic that failed to confirm the association.

A preferable approach consists of critically
evaluating and reporting results on the basis of criteria
other than (or including but not limited to)
statistical significance. Biological plausibility,
possible sources of bias and confounding, and number of
tested associations are amongst such other criteria.
Recently, statistical approaches have been proposed
to take into account the possibility that significant
results are generated by chance when many
comparisons are made [20].
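The effect of the number of tested associations can be made concrete with a short calculation. The sketch below shows the probability of at least one chance finding amongst independent tests, together with two standard corrections, Bonferroni and Benjamini-Hochberg. These are offered as familiar examples under the stated assumptions and are not necessarily the approaches proposed in the cited reference; the p-values are invented.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false-positive result amongst n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Flag p-values that remain significant at the level alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure,
    which controls the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank meeting the bound
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


# Invented p-values from ten associations tested in a single study.
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print(f"P(at least one chance finding in 20 tests) = {family_wise_error(0.05, 20):.2f}")
print("unadjusted:         ", sum(p <= 0.05 for p in P_VALUES), "significant")
print("Bonferroni:         ", sum(bonferroni(P_VALUES)), "significant")
print("Benjamini-Hochberg: ", sum(benjamini_hochberg(P_VALUES)), "significant")
```

The Bonferroni correction is the more conservative of the two, whilst the Benjamini-Hochberg procedure accepts a controlled proportion of false discoveries in exchange for greater power.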

Conclusions 

Almost 20 years have passed since the term
'molecular epidemiology' was proposed [21]. It is
clear that molecular techniques have found an
important (and growing) role in epidemiological
studies. So far, however, there are not many cases in
which the application of a molecularly or, more
generally, biologically based approach has represented
an enormous leap in the evidence brought by
traditional epidemiological methods. Assessment of
exposure to aflatoxins, enhanced sensitivity and
specificity of assessment of past viral infection, and
detection of protein and DNA adducts in workers
exposed to reactive chemicals (such as ethylene
oxide) are amongst the examples in which
molecular epidemiology has greatly contributed to the
understanding of human cancer. In many other
cases, however, initial, promising results have not
been confirmed by subsequent, usually
methodologically more sound, investigations. They include, in
particular, the search for susceptibility to
environmental carcinogens by looking at polymorphism for
metabolic enzymes [22].

If biomarkers offer new opportunities to overcome
some of the limitations of epidemiology, their added
value over traditional approaches should be
systematically assessed. Biomarkers should be validated;
consideration of sources of bias and confounding in
molecular epidemiology studies should be no less
stringent than in other types of epidemiological
studies. Similarly, other aspects of the study (e.g.
determination of required sample size, statistical
analysis, reporting and interpretation of results)
should be approached with the same rigor as that
used in epidemiology in general. When molecular
epidemiological studies are conducted according
to state-of-the-art design, analysis and
interpretation, the discipline will have reached its maturity.